PaliGemma/[PaliGemma_2]Using_with_Transformersjs.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "BVp8vazXYOz-" }, "source": [ "##### Copyright 2024 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "DBaXaQ_PYT4p" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "XSkl5h3dZMo9" }, "source": [ "# PaliGemma2 - Run with Transformers.js\n", "\n", "Author: Sitam Meur\n", "\n", "* GitHub: [github.com/sitamgithub-MSIT](https://github.com/sitamgithub-MSIT/)\n", "* X: [@sitammeur](https://x.com/sitammeur)\n", "\n", "Description: This notebook demonstrates how you can run inference on PaliGemma2 model using Node.js and [Transformers.js](https://huggingface.co/docs/transformers.js/index). Transformers.js lets you run Hugging Face's transformer models directly in browser, offering a JavaScript API similar to Python's. It supports NLP, computer vision, audio, and multimodal tasks using ONNX Runtime and allows easy conversion of PyTorch, TensorFlow, and JAX models.\n", "\n", "<table align=\"left\">\n", " <td>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/PaliGemma/[PaliGemma_2]Using_with_Transformersjs.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " </td>\n", "</table>" ] }, { "cell_type": "markdown", "metadata": { "id": "Ftkrrn3aZyAl" }, "source": [ "## Setup\n", "\n", "### Select the Colab runtime\n", "To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the PaliGemma 2 model. In this case, you can use CPU runtime:\n", "\n", "1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n", "2. Select **Change runtime type**.\n", "3. Under **Hardware accelerator**, select **CPU**." ] }, { "cell_type": "markdown", "metadata": { "id": "eCJ7yo3-Zzdj" }, "source": [ "## Installation" ] }, { "cell_type": "markdown", "metadata": { "id": "ET_KH77YZ5lc" }, "source": [ "Let's get started with installing the dependencies." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d7Ds4h2q0ItU" }, "outputs": [], "source": [ "# Install Node.js\n", "!curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -\n", "!sudo apt-get install -y nodejs" ] }, { "cell_type": "markdown", "metadata": { "id": "ObI8d_Rwa_nn" }, "source": [ "## Create Node.js project\n", "\n", "Create a new Node.js project and install the required transformers package via [NPM](https://www.npmjs.com/package/@huggingface/transformers)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5nmPcg5J0cYj" }, "outputs": [], "source": [ "# Create project directory\n", "!mkdir paligemma2-node\n", "%cd paligemma2-node\n", "\n", "# Initialize NPM project\n", "!npm init -y\n", "!npm i @huggingface/transformers" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gGYVoOXA3T7-" }, "outputs": [], "source": [ "%%writefile package.json\n", "\n", "{\n", " \"name\": \"paligemma2-node\",\n", " \"version\": \"1.0.0\",\n", " \"main\": \"index.js\",\n", " \"type\": \"module\",\n", " \"scripts\": {\n", " \"test\": \"echo \\\"Error: no test specified\\\" && exit 1\"\n", " },\n", " \"keywords\": [],\n", " \"author\": \"\",\n", " \"license\": \"ISC\",\n", " \"description\": \"\",\n", " \"dependencies\": {\n", " \"@huggingface/transformers\": \"^3.1.2\"\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": { "id": "U9mJ6Pi3bxCY" }, "source": [ "## Transformers.js Inference\n", "\n", "Now, let's run inference on the PaliGemma2 model using Transformers.js. First, load the model and processor and then prepare inputs (Text query + Image) to run inference and get the output as desired image caption. For reference, you can check the model's page on the Hugging Face model hub under ONNX models section [here](https://huggingface.co/onnx-community/paligemma2-3b-pt-224)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8149e6d989e6" }, "outputs": [], "source": [ "# Show the image from the URL\n", "from PIL import Image\n", "import requests\n", "\n", "url = \"https://jethac.github.io/assets/juice.jpg\"\n", "img = Image.open(requests.get(url, stream=True).raw) \n", "img" ] }, { "cell_type": "markdown", "metadata": { "id": "dbed89fc4a5f" }, "source": [ "It's an image of a cat sitting on a bag, now let's see what the model predicts." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-Gy2LdaY3Iqh" }, "outputs": [], "source": [ "%%writefile index.js\n", "\n", "// Import the required modules\n", "import {\n", " AutoProcessor,\n", " PaliGemmaForConditionalGeneration,\n", " load_image,\n", "} from \"@huggingface/transformers\";\n", "\n", "// Load processor and model\n", "const model_id = \"onnx-community/paligemma2-3b-pt-224\"; // Change this to use a different PaliGemma model\n", "const processor = await AutoProcessor.from_pretrained(model_id);\n", "const model = await PaliGemmaForConditionalGeneration.from_pretrained(\n", " model_id,\n", " {\n", " dtype: {\n", " embed_tokens: \"q8\", // or 'fp16'\n", " vision_encoder: \"q8\", // or 'q4', 'fp16'\n", " decoder_model_merged: \"q4\", // or 'q4f16'\n", " },\n", " }\n", ");\n", "console.log(\"Model and processor loaded successfully!\");\n", "\n", "// Prepare inputs\n", "const url = \"https://jethac.github.io/assets/juice.jpg\";\n", "const raw_image = await load_image(url);\n", "const prompt = \"<image>\"; // Caption, by default\n", "const inputs = await processor(raw_image, prompt);\n", "console.log(\"Inputs prepared successfully!\");\n", "\n", "try {\n", " // Generate a response\n", " const response = await model.generate({\n", " ...inputs,\n", " max_new_tokens: 100, // Maximum number of tokens to generate\n", " });\n", "\n", " // Extract generated IDs from the response\n", " const generatedIds = response.slice(null, [inputs.input_ids.dims[1], null]);\n", "\n", " // Decode the generated IDs to get the answer\n", " const decodedAnswer = processor.batch_decode(generatedIds, {\n", " skip_special_tokens: true,\n", " });\n", "\n", " // Log the generated caption\n", " console.log(\"Generated caption:\", decodedAnswer[0]);\n", "} catch (error) {\n", " console.error(\"Error generating response:\", error);\n", "}" ] }, { "cell_type": "markdown", "metadata": { "id": "Zz81XOKebf_j" }, "source": [ "## Run Application" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PmVUVFDf3-yE" }, "outputs": [], "source": [ "# Run the node.js application\n", "!node index.js" ] }, { "cell_type": "markdown", "metadata": { "id": "gFgOMvLmcVnY" }, "source": [ "## Conclusion\n", "\n", "Congratulations! You have successfully run inference on PaliGemma2 model using Transformers.js via Node.js environment. You can now integrate this into your projects." ] } ], "metadata": { "colab": { "name": "[PaliGemma_2]Using_with_Transformersjs.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }